All Questions

0 votes · 0 answers · 23 views

How to use cross validation to select/evaluate model with probability score as the output?

Initially I was evaluating my models using cross_val with out-of-the-box metrics such as precision, recall, F1 score, etc., or with my own metrics defined in ...
asked by szheng
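A minimal sketch of one way to do this, assuming scikit-learn and a synthetic dataset: built-in scorers such as neg_log_loss and roc_auc consume the model's predict_proba output directly, so cross_val_score can evaluate probability scores without converting them to hard labels.

```python
# Minimal sketch (assumptions: scikit-learn, a synthetic dataset): score a
# probabilistic classifier in cross-validation with metrics that use
# predict_proba output rather than hard class labels.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, random_state=0)
clf = LogisticRegression(max_iter=1000)

# 'neg_log_loss' and 'roc_auc' are built-in probability-based scorers.
log_loss_scores = cross_val_score(clf, X, y, cv=5, scoring="neg_log_loss")
auc_scores = cross_val_score(clf, X, y, cv=5, scoring="roc_auc")
print(log_loss_scores.mean(), auc_scores.mean())
```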
0 votes · 0 answers · 87 views

XGBoost Classifier Evaluation Confusion on New Dataset Despite High Cross-Validation Scores

I have built an XGBoost classifier model with 90 features, trained on a dataset containing 760k samples. I took great care to separate the labels from the features in both the training and testing ...
asked by oklen
0 votes · 3 answers · 877 views

For cross validation should I use training set, or whole dataset?

I'm new to data science and I have a problem understanding what dataset to use when using cross validation for model evaluation. Let's say I have two models: LogisticRegression and ...
asked by Michał Jurzak
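A minimal sketch of the usual arrangement, assuming scikit-learn and a synthetic dataset: cross-validation runs on the training portion only, and the held-out test set is used once for a final check of the selected model.

```python
# Minimal sketch (assumptions: scikit-learn, a synthetic dataset): run
# cross-validation on the training split only and keep a held-out test set
# for a single final evaluation of the chosen model.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split

X, y = make_classification(n_samples=1000, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

model = LogisticRegression(max_iter=1000)
cv_scores = cross_val_score(model, X_train, y_train, cv=5)  # model-selection signal
model.fit(X_train, y_train)
test_score = model.score(X_test, y_test)                    # final estimate on unseen data
print(cv_scores.mean(), test_score)
```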
0 votes · 1 answer · 747 views

Can I use GridSearchCV.best_score_ for evaluation of model performance?

The scikit-learn page on Grid Search says: Model selection by evaluating various parameter settings can be seen as a way to use the labeled data to “train” the parameters of the grid. When evaluating the ...
asked by Charlie
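A minimal sketch, assuming scikit-learn and a synthetic dataset, of how best_score_ (the cross-validated score of the winning parameters on the search data) can be contrasted with a score on a held-out test set, which is typically the less optimistic estimate.

```python
# Minimal sketch (assumptions: scikit-learn, a synthetic dataset):
# GridSearchCV.best_score_ is the CV score of the best parameter setting on
# the data the search saw; a held-out test set gives a separate estimate.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=1000, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

search = GridSearchCV(SVC(), {"C": [0.1, 1, 10]}, cv=5)
search.fit(X_train, y_train)
print("best_score_ (CV on training data):", search.best_score_)
print("held-out test score:", search.score(X_test, y_test))
```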
0 votes · 1 answer · 116 views

How do I know If my regression model is underfitting?

How do we evaluate the performance of a regression model with a certain RMSE when no domain-knowledge performance benchmark is available? Maybe MAPE is one way of comparing the performance of my ...
asked by Mehmet Deniz
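One hedged way to sanity-check an RMSE without a domain benchmark is to compare it against a naive baseline; a minimal sketch, assuming scikit-learn (0.24+ for mean_absolute_percentage_error) and a synthetic dataset:

```python
# Minimal sketch (assumptions: scikit-learn >= 0.24, a synthetic dataset):
# compare a model's RMSE/MAPE against a mean-predicting baseline; barely
# beating the baseline is one symptom of underfitting.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.dummy import DummyRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_percentage_error, mean_squared_error
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=1000, noise=10.0, random_state=0)
y = y - y.min() + 1.0  # shift targets to be positive so MAPE stays well-defined
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for name, model in [("mean baseline", DummyRegressor()), ("linear model", LinearRegression())]:
    pred = model.fit(X_train, y_train).predict(X_test)
    rmse = np.sqrt(mean_squared_error(y_test, pred))
    mape = mean_absolute_percentage_error(y_test, pred)
    print(f"{name}: RMSE={rmse:.2f}, MAPE={mape:.3f}")
```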
1 vote · 0 answers · 306 views

Grouped stratified train-val-test split for a multilabel dataset

So this is indeed nontrivial. I was wondering if there is a fast heuristic algorithm for performing a grouped, stratified split of a multilabel dataset. Stratification is usually performed to ...
asked by jasperhyp
1 vote · 0 answers · 633 views

How does exactly eval_set and RandomizedSearchCV work for LightGBM?

How does RandomizedSearchCV form its validation sets when I have also defined an evaluation set for LGBM? Are they formed from the train set I provided, or how does the evaluation set come into the validation? ...
asked by morqueatsz
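A minimal sketch, assuming lightgbm and scikit-learn: RandomizedSearchCV carves its validation folds out of whatever data is passed to search.fit(), while an eval_set forwarded as a fit parameter stays a fixed, separate holdout that LightGBM uses for per-iteration evaluation, not for scoring the hyperparameter candidates.

```python
# Minimal sketch (assumptions: lightgbm + scikit-learn installed): the search's
# CV folds come from the data given to search.fit(); the eval_set passed through
# as a fit parameter is a separate, fixed holdout used by LightGBM internally.
import lightgbm as lgb
from sklearn.datasets import make_classification
from sklearn.model_selection import RandomizedSearchCV, train_test_split

X, y = make_classification(n_samples=2000, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

search = RandomizedSearchCV(
    lgb.LGBMClassifier(),
    {"num_leaves": [15, 31, 63], "learning_rate": [0.05, 0.1]},
    n_iter=4,
    cv=3,            # folds are carved out of X_train/y_train internally
    random_state=0,
)
# eval_set is forwarded to LGBMClassifier.fit for each candidate/fold.
search.fit(X_train, y_train, eval_set=[(X_val, y_val)])
print(search.best_params_, search.best_score_)
```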
0 votes · 0 answers · 52 views

How to evaluate model accuracy at tail of empirical distribution?

I am fitting a nonlinear regression on a stationary dependent variable and I want to precisely forecast extreme values of this variable. So when my model predicts extreme values I want them to be highly ...
asked by Łukasz Czop
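A minimal sketch of one way to look at tail accuracy, assuming only NumPy and synthetic predictions: restrict the error metric to the samples whose true value lies beyond a high quantile.

```python
# Minimal sketch (assumptions: NumPy only, synthetic predictions): evaluate
# error separately on the tail of the empirical distribution by restricting
# the metric to samples whose true value exceeds a high quantile.
import numpy as np

rng = np.random.default_rng(0)
y_true = rng.standard_t(df=3, size=5000)            # heavy-tailed target
y_pred = y_true + rng.normal(scale=0.5, size=5000)  # imperfect forecasts

tail = y_true > np.quantile(y_true, 0.95)           # top 5% of true values
rmse_all = np.sqrt(np.mean((y_true - y_pred) ** 2))
rmse_tail = np.sqrt(np.mean((y_true[tail] - y_pred[tail]) ** 2))
print(rmse_all, rmse_tail)
```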
0 votes · 2 answers · 320 views

I am attempting to implement k-fold cross validation in Python 3. What is the best way to implement this? Is it preferable to use pandas or NumPy? [closed]

I am attempting to create a script to implement cross validation on my data. However, the splits cannot take records at random, so that training and testing can be done on equal data splits for each ...
asked by AGX301
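A minimal hand-rolled sketch, assuming NumPy and pandas with a small synthetic DataFrame: shuffle the row indices, split them into k folds, and let each fold serve as the test set once (scikit-learn's KFold does the same bookkeeping for you).

```python
# Minimal sketch (assumptions: NumPy + pandas, a synthetic DataFrame): a
# hand-rolled k-fold split over row indices; each fold becomes the test set
# once while the remaining folds form the training set.
import numpy as np
import pandas as pd

df = pd.DataFrame({"x": np.arange(100), "y": np.arange(100) % 2})
k = 5
rng = np.random.default_rng(0)
folds = np.array_split(rng.permutation(len(df)), k)

for i, test_idx in enumerate(folds):
    train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
    train, test = df.iloc[train_idx], df.iloc[test_idx]
    print(f"fold {i}: {len(train)} train rows, {len(test)} test rows")
```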
1 vote · 1 answer · 725 views

n_jobs=-1 or n_jobs=1?

I am confused about the n_jobs parameter used in some models and for CV. I know it is used for parallel computing and specifies the number of processors to use. So if I ...
asked by spectre (2,203 reputation)
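A minimal sketch, assuming scikit-learn and a synthetic dataset: n_jobs=1 runs the folds sequentially in one process, while n_jobs=-1 distributes them over all available cores, which you can see by timing the same cross-validation both ways.

```python
# Minimal sketch (assumptions: scikit-learn, a synthetic dataset): compare the
# wall-clock time of the same cross-validation run with and without
# parallelism over CPU cores.
import time
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=2000, n_features=30, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0)

for n_jobs in (1, -1):
    start = time.perf_counter()
    cross_val_score(clf, X, y, cv=5, n_jobs=n_jobs)  # folds run in parallel when n_jobs=-1
    print(f"n_jobs={n_jobs}: {time.perf_counter() - start:.1f}s")
```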
1 vote · 0 answers · 126 views

Imbalanced dataset, finding the statistical significance of a Matthews Correlation Coefficient (MCC) in binary classification (what is a good MCC)?

I have a very imbalanced dataset. Thus, I am using MCC to evaluate the performance of various ML algorithms. It appears that the literature is entirely lacking in ways to evaluate how good an MCC score is. ...
asked by Prospero
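One hedged way to attach a significance level to an MCC value is a label-permutation test; a minimal sketch assuming scikit-learn, NumPy, and synthetic imbalanced labels:

```python
# Minimal sketch (assumptions: scikit-learn + NumPy, synthetic labels): a
# permutation test comparing the observed MCC against MCC values obtained by
# shuffling the true labels, giving an empirical p-value for "better than
# chance" on an imbalanced problem.
import numpy as np
from sklearn.metrics import matthews_corrcoef

rng = np.random.default_rng(0)
y_true = rng.binomial(1, 0.05, size=2000)                        # ~5% positives
y_pred = np.where(rng.random(2000) < 0.8, y_true, 1 - y_true)    # imperfect predictor

observed = matthews_corrcoef(y_true, y_pred)
null = np.array([
    matthews_corrcoef(rng.permutation(y_true), y_pred) for _ in range(1000)
])
p_value = (np.sum(null >= observed) + 1) / (len(null) + 1)
print(observed, p_value)
```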
0 votes · 1 answer · 2k views

Machine Learning validation data returns 100% accuracy [closed]

I'm testing a machine learning model, and the validation data returns 100% correct answers. Is it overfitting, or does the model work extremely well? Do I need to continue training on more data? I'...
asked by MXK (184 reputation)
0 votes · 1 answer · 379 views

Difference between validation and prediction

As a follow-up to "Validate via predict() or via fit()?", I wonder about the difference between validation and prediction. To keep it simple, I will refer to train, ...
asked by Ben (570 reputation)
1 vote · 1 answer · 115 views

Validity of cross-validation for model performance estimation

When applying cross-validation for estimating the performance of a predictive model, the reported performance is usually the average performance over all the validation folds. As during this procedure,...
asked by C.S.
5 votes · 3 answers · 7k views

In k-fold cross-validation, why do we compute the mean of the metric over the folds?

In k-fold cross-validation, the "correct" scheme seems to be to compute the metric (say, the accuracy) for each fold and then return the mean as the final metric. Source: https://scikit-learn.org/stable/...
asked by Alexis Pister
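A minimal sketch, assuming scikit-learn and a synthetic dataset: each fold produces one score, and the mean (with the standard deviation as a spread indicator) summarizes them.

```python
# Minimal sketch (assumptions: scikit-learn, a synthetic dataset): k-fold CV
# yields one score per fold; the mean summarizes them and the standard
# deviation indicates how much the estimate varies across folds.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, random_state=0)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5, scoring="accuracy")
print("per-fold accuracies:", scores)   # one score per validation fold
print("mean:", scores.mean(), "std:", scores.std())
```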
